Efficient Name Variation Detection
Abstract
Semantic integration, link analysis and other forms of evidence detection often require recognizing multiple occurrences of a single name. However, names frequently occur in orthographic variations that result from phonetic variations and transcription errors. The computational expense of similarity assessment algorithms usually precludes applying them to all pairs of strings. Instead, it is typically necessary to use a high-recall, low-precision index to retrieve a smaller set of candidate matches, to which the similarity assessment algorithm is then applied. This paper describes five algorithms for efficient candidate retrieval: Burkhard-Keller trees (BKT); filtered Burkhard-Keller trees (FBKT); partition filtering; n-grams; and Soundex. An empirical evaluation showed that no single algorithm performed best under all circumstances. When the source of name variations was purely orthographic, partition filtering generally performed best. When similarity assessment was based on phonetic similarity and the phonetic model was available, BKT and FBKT performed best. When the pronunciation model was unavailable, Soundex was best for k=0 (homonyms), and partition filtering or BKT performed best for k>0. Unfortunately, the high-recall retrieval algorithms were multiple orders of magnitude more costly than the low-recall algorithms.
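To make the candidate-retrieval idea concrete, the following is a minimal sketch of a Burkhard-Keller tree, one of the five index structures named in the abstract. It is not the paper's implementation. It assumes plain Levenshtein edit distance as the similarity measure and treats k as the maximum allowed distance; the class name `BKTree`, the helper `edit_distance`, and the sample names are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


class BKTree:
    """Metric tree: each child edge is labelled with its distance to the parent."""

    def __init__(self) -> None:
        # A node is a (name, children) pair; children maps distance -> node.
        self.root: Optional[Tuple[str, Dict[int, tuple]]] = None

    def add(self, name: str) -> None:
        if self.root is None:
            self.root = (name, {})
            return
        node = self.root
        while True:
            d = edit_distance(name, node[0])
            if d == 0:
                return  # name is already indexed
            child = node[1].get(d)
            if child is None:
                node[1][d] = (name, {})
                return
            node = child

    def search(self, query: str, k: int) -> List[Tuple[int, str]]:
        """Return (distance, name) pairs with distance <= k, sorted."""
        results: List[Tuple[int, str]] = []
        if self.root is None:
            return results
        stack = [self.root]
        while stack:
            name, children = stack.pop()
            d = edit_distance(query, name)
            if d <= k:
                results.append((d, name))
            # Triangle inequality: a match can only lie in subtrees whose
            # edge label is within k of d, so all other subtrees are skipped.
            for edge, child in children.items():
                if d - k <= edge <= d + k:
                    stack.append(child)
        return sorted(results)


# Hypothetical sample data, for illustration only.
NAMES = ["Mohammed", "Mohamed", "Muhammad", "Mohammad",
         "Catherine", "Katherine", "Kathryn"]
tree = BKTree()
for n in NAMES:
    tree.add(n)

print(tree.search("Mohamad", 1))
```

The pruning step is what lets the tree avoid comparing the query against every indexed name. How much it saves depends on k and on how the distances are distributed, which is consistent with the abstract's observation that no single index performed best in all settings.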
Publication date: 2006